Statistical-Computational Tradeoffs in Planted Models: The High-Dimensional Setting

Authors

  • Yudong Chen
  • Jiaming Xu
Abstract

Planted models assume that a graph is generated from a set of clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition and planted coloring. This paper studies the statistical-computational tradeoffs of these models. Our focus is the high-dimensional setting, where the number of clusters is allowed to grow with the number of nodes. We show that the complexities of cluster recovery exhibit phase transitions. In particular, the space of model parameters can be partitioned into four regions with decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds; (3) the easy regime, where a polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple algorithm based on counting degrees and common neighbors succeeds. Moreover, each of these algorithms is likely to fail in the harder regime.
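As a rough illustration of the generative model described above, the following sketch samples a planted partition graph and computes the degree and common-neighbor counts that the abstract mentions for the simple regime. The function names, the parameters `p` and `q` (in-cluster and cross-cluster edge probabilities) and the equal cluster sizes are assumptions made for this example; the abstract does not give the paper's parameterization or the exact counting algorithm, so this shows only the raw statistics, not the recovery procedure itself.

```python
import numpy as np

def sample_planted_partition(n, num_clusters, cluster_size, p, q, seed=0):
    """Sample an undirected graph with planted clusters (hypothetical helper).

    The first num_clusters * cluster_size nodes are split into equal clusters;
    any remaining nodes get label -1 and belong to no cluster.
    Edges appear with probability p inside a cluster and q otherwise.
    """
    if num_clusters * cluster_size > n:
        raise ValueError("clusters do not fit into n nodes")
    rng = np.random.default_rng(seed)
    labels = np.full(n, -1)
    labels[: num_clusters * cluster_size] = np.repeat(
        np.arange(num_clusters), cluster_size
    )
    # same[i, j] is True when i and j share a (real) cluster.
    same = (labels[:, None] == labels[None, :]) & (labels[:, None] >= 0)
    probs = np.where(same, p, q)
    # Draw each unordered pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels

def degrees(adj):
    """Number of neighbors of each node."""
    return adj.sum(axis=1)

def common_neighbors(adj):
    """Entry (i, j) counts the nodes adjacent to both i and j."""
    return adj @ adj

if __name__ == "__main__":
    adj, labels = sample_planted_partition(
        n=200, num_clusters=4, cluster_size=50, p=0.6, q=0.1
    )
    cn = common_neighbors(adj)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    print("mean degree:", degrees(adj).mean())
    print("mean common neighbors, same cluster:", cn[same & off_diag].mean())
    print("mean common neighbors, different clusters:", cn[~same].mean())
```

When `p` is well above `q`, pairs of nodes in the same cluster tend to share more neighbors than pairs in different clusters, which is the kind of signal a counting-based method can use.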

Similar resources

Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting

The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the high-dimensional setting where the number of cluste...

Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices

We consider two closely related problems: planted clustering and submatrix localization. In the planted clustering problem, a random graph is generated based on an underlying cluster structure of the nodes; the task is to recover these clusters given the graph. The submatrix localization problem concerns locating hidden submatrices with elevated means inside a large real-valued random matrix. O...

Sharp Computational-Statistical Phase Transitions via Oracle Computational Model

We study the fundamental tradeoffs between computational tractability and statistical accuracy for a general family of hypothesis testing problems with combinatorial structures. Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget con...

Statistical and Computational Tradeoffs of Regularized Dantzig-type Estimator

Nesterov’s smoothing technique has been widely applied to solve non-smooth optimization problems involving high dimensional statistical models. However, existing theory focuses more on its computational properties rather than statistical properties. This paper bridges this gap by studying a family of regularized Dantzig-type estimators. For these estimators, we show that the smoothing technique...

Finding and Leveraging Structure in Learning Problems

The problem of learning from noisy and high dimensional data is an important challenge that has received much attention in the modern machine learning and statistics literature. These problems arise in numerous applications: large scale collaborative filtering, learning gene regulatory networks and genome wide association studies to name a few. This thesis focuses on understanding the statistic...

Publication date: 2013